CUDA Programming Guide: Architectural Paradigms: von Neumann vs. Harvard Architecture

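A computing system's fundamental design is determined by the relationship between its processing unit and its memory. The key distinction is whether instructions and data share a single pathway or travel over separate ones.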

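The von Neumann architecture, the model adopted by general-purpose systems such as x86-64, uses a unified memory space. The CPU reaches both code and data over a single bus. This produces the von Neumann bottleneck: the delay incurred when the CPU must switch the bus between fetching instructions and accessing operands.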
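The serialization described above can be sketched as a toy cycle-count model. This is a minimal sketch under stated assumptions, not a simulator of any real CPU. The `Instr` class and the `von_neumann_trace` function are hypothetical names. The model assumes every bus transaction takes exactly one cycle and ignores caches and pipelining.

```python
# Toy cycle-count model of a von Neumann machine with ONE shared bus.
# Hypothetical assumptions: every bus transaction takes exactly one cycle,
# and there are no caches and no pipelining.
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    data_accesses: int  # operand loads/stores this instruction performs


def von_neumann_trace(program):
    """Return (total_cycles, trace); trace[i] is what the bus did in cycle i."""
    trace = []
    for instr in program:
        # The shared bus first carries the instruction fetch...
        trace.append(f"fetch {instr.name}")
        # ...and only then the instruction's data operands, one per cycle.
        for i in range(instr.data_accesses):
            trace.append(f"data  {instr.name}[{i}]")
    return len(trace), trace


# ADD is assumed to work on registers only, so it makes no data accesses.
program = [Instr("LOAD", 1), Instr("ADD", 0), Instr("STORE", 1)]
cycles, trace = von_neumann_trace(program)
for cycle, action in enumerate(trace):
    print(cycle, action)
print("total cycles:", cycles)  # total cycles: 5
```

With one shared bus, these three instructions need five bus cycles. Each operand access has to wait until the bus has finished fetching its instruction.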

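The Harvard architecture is common in specialized processors and in ARMv8-A L1 cache implementations. It uses physically separate memory stores and signal pathways for instructions and data, so an opcode and a data operand can be fetched at the same time, which significantly increases throughput.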
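To compare the two designs, the idealized sketch below counts bus transactions under each model. It hypothetically assumes one transaction per bus per cycle, perfect overlap between the instruction bus and the data bus, and no caches or pipeline hazards. The Harvard figure is therefore a lower bound, not a prediction for real hardware. Both function names are illustrative.

```python
# Idealized comparison of bus transactions; NOT a cycle-accurate simulator.
# Assumptions: one transaction per bus per cycle, perfect overlap on the
# Harvard side, no caches, no pipeline hazards.


def von_neumann_cycles(instr_fetches: int, data_accesses: int) -> int:
    # A single shared bus serializes every transaction.
    return instr_fetches + data_accesses


def harvard_cycles(instr_fetches: int, data_accesses: int) -> int:
    # Separate instruction and data buses work in parallel, so the ideal
    # lower bound is set by whichever bus is busier.
    return max(instr_fetches, data_accesses)


# One instruction plus one data operand:
print(von_neumann_cycles(1, 1), harvard_cycles(1, 1))  # 2 1

# 100 instructions, 40 of which each touch memory once:
print(von_neumann_cycles(100, 40), harvard_cycles(100, 40))  # 140 100
```

Real processors rarely reach this bound, because an operand's address usually depends on the instruction that was just decoded. In practice the overlap mostly comes from fetching the next instruction while the current one accesses data.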

[Figure: Flowchart of the memory fetch cycle in a von Neumann architecture, showing the bus being used sequentially.]

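Modern high-performance computing systems usually adopt a Modified Harvard Architecture. At the L1 cache level, with separate instruction and data caches, they behave like Harvard machines to maximize speed. At the main-memory level they keep the von Neumann model to preserve programming flexibility.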
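The split-caches-over-unified-memory arrangement can be illustrated with a toy model. The class `ModifiedHarvard` and its methods are hypothetical. The model assumes a single core, a write-through data cache, and no hardware coherence between the instruction and data caches. It shows both halves of the design. Code written through the data path lands in unified memory and can later be fetched as instructions, but a stale line in the separate instruction cache must first be explicitly invalidated.

```python
# Toy model of a Modified Harvard memory hierarchy (hypothetical names).
# Assumptions: single core, write-through L1 data cache, and NO hardware
# coherence between the L1 instruction cache and the L1 data cache.


class ModifiedHarvard:
    def __init__(self):
        self.memory = {}  # unified main memory: code and data share one address space
        self.icache = {}  # L1 instruction cache (Harvard side)
        self.dcache = {}  # L1 data cache (Harvard side)

    def store(self, addr, value):
        # Data-side write: update the L1 D-cache and write through to main memory.
        self.dcache[addr] = value
        self.memory[addr] = value

    def fetch_instr(self, addr):
        # Instruction-side read: hit in the L1 I-cache if present, else fill from memory.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def invalidate_icache(self, addr):
        # Explicit maintenance step: drop the possibly stale I-cache line.
        self.icache.pop(addr, None)


m = ModifiedHarvard()
m.store(0x100, "NOP")
print(m.fetch_instr(0x100))  # NOP (code written as data is executable: von Neumann side)
m.store(0x100, "ADD")        # rewrite the instruction through the data path
print(m.fetch_instr(0x100))  # NOP (stale, because the I-cache is separate: Harvard side)
m.invalidate_icache(0x100)
print(m.fetch_instr(0x100))  # ADD
```

The invalidation step loosely mirrors the cache maintenance that JIT compilers perform on ARM cores before running freshly written code. Other architectures, such as x86-64, keep the instruction cache coherent largely in hardware, which this toy model does not capture.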

QUESTION 1

What is the defining characteristic of the von Neumann Bottleneck?

The CPU speed is slower than the bus speed.

A single bus must alternate between fetching code and accessing data.

The memory capacity is too small for modern code.

The L1 cache and L2 cache use different voltages.

QUESTION 2

Which architecture is typically used for L1 cache implementations in ARMv8-A?

Pure von Neumann

Harvard Architecture

Stack-based Architecture

Single-Bus CISC

QUESTION 3

In a Modified Harvard Architecture, where does the 'von Neumann' aspect usually reside?

At the L1 Cache level

At the Main RAM/Global Memory level

Inside the Arithmetic Logic Unit

In the register file

QUESTION 4

What advantage does a von Neumann architecture provide to Just-In-Time (JIT) compilers?

It prevents memory fragmentation.

It treats written instructions exactly like data variables.

It allows for higher clock frequencies.

It automatically encrypts memory.

QUESTION 5

How many clock cycles are minimally required to fetch one instruction and one data operand in a pure Harvard architecture?

One cycle (Simultaneous fetch)

Two cycles (Sequential fetch)

Four cycles (Multiplexed fetch)

Zero cycles (Pre-cached)